Simultaneous Perturbation Algorithms for Batch Off-Policy Search

Author: Fonteneau Raphael
Prashanth L. A.
Publication date: 01/01/2014

We propose novel policy search algorithms in the context of off-policy, batch mode reinforcement learning (RL) with continuous state and action spaces. Given a batch collection of trajectories, we perform off-line policy evaluation using an algorithm similar to that of [Fonteneau et al., 2010]. Using this Monte Carlo-like policy evaluator, we perform policy search in a class of parameterized policies. We propose both first-order policy gradient and second-order policy Newton algorithms. All our algorithms incorporate simultaneous perturbation estimates for the gradient as well as the Hessian of the cost-to-go vector, since the latter is unknown and only biased estimates are available. We demonstrate their practicality on a simple 1-dimensional continuous state space problem.
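As a rough illustration of the simultaneous perturbation idea mentioned in this abstract, the sketch below estimates a gradient from two evaluations of a cost function, whatever the dimension of the parameter vector. It is a generic two-sided estimate with +/-1 perturbations, not the authors' algorithm: the names `spsa_gradient`, `spsa_descent`, and `quadratic_cost`, the step sizes, and the toy cost standing in for a batch policy evaluator are all assumptions made for the example.

```python
import random


def spsa_gradient(f, theta, delta=1e-2, rng=random):
    """One two-sided simultaneous perturbation estimate of the gradient of f at theta."""
    # Draw one +/-1 perturbation per coordinate (hypothetical choice of distribution).
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * s for t, s in zip(theta, signs)]
    minus = [t - delta * s for t, s in zip(theta, signs)]
    # A single finite difference is shared by all coordinates.
    diff = (f(plus) - f(minus)) / (2.0 * delta)
    return [diff / s for s in signs]


def quadratic_cost(theta):
    """Toy cost standing in for a policy evaluator; minimised at theta = (1, ..., 1)."""
    return sum((t - 1.0) ** 2 for t in theta)


def spsa_descent(f, theta, steps=200, step_size=0.05, delta=1e-2, seed=0):
    """Plain first-order descent driven by the simultaneous perturbation estimate."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = spsa_gradient(f, theta, delta, rng)
        theta = [t - step_size * g for t, g in zip(theta, grad)]
    return theta
```

Calling `spsa_descent(quadratic_cost, [3.0, -2.0])` moves the parameters toward the minimiser of the toy cost. In one dimension the estimate reduces to an ordinary central difference.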

arXiv.org e-Print Archive

CiteSeerX

Crossref

Open Repository and Bibliography - Liège

How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies

Author: Ernst Damien
Fonteneau Raphael
François-Lavet Vincent
Publication date: 01/12/2015

Using deep neural nets as function approximators for reinforcement learning tasks has recently been shown to be very powerful for solving problems approaching real-world complexity. Using these results as a benchmark, we discuss the role that the discount factor may play in the quality of the learning process of a deep Q-network (DQN). When the discount factor progressively increases up to its final value, we empirically show that it is possible to significantly reduce the number of learning steps. When used in conjunction with a varying learning rate, we empirically show that it outperforms the original DQN on several experiments. We relate this phenomenon to the instabilities of neural networks when they are used in an approximate Dynamic Programming setting. We also describe the possibility of falling into a local optimum during the learning process, thus connecting our discussion with the exploration/exploitation dilemma.
Comment: NIPS 2015 Deep Reinforcement Learning Workshop
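To make "a discount factor that progressively increases up to its final value" concrete, here is a minimal sketch of one possible schedule. The abstract does not give the update rule, so the geometric shrinking of `1 - gamma`, the `shrink` value, and the function name `discount_schedule` are assumptions made for illustration only.

```python
def discount_schedule(gamma_start, gamma_final, shrink=0.98, epochs=50):
    """Return one discount factor per epoch, rising from gamma_start toward gamma_final.

    Assumed rule (not taken from the abstract): the gap 1 - gamma is
    multiplied by `shrink` after every epoch and gamma is capped at gamma_final.
    """
    gammas = [gamma_start]
    for _ in range(epochs - 1):
        nxt = 1.0 - shrink * (1.0 - gammas[-1])
        gammas.append(min(nxt, gamma_final))
    return gammas
```

A training loop would read `gammas[epoch]` when computing its bootstrapped targets. A learning rate could be varied alongside it in the same way.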

arXiv.org e-Print Archive

Open Repository and Bibliography - Liège

Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

Author: Boigelot Bernard
Ernst Damien
Fonteneau Raphael
Louveaux Quentin
Publication date: 01/01/2012

We study the min max optimization problem introduced in [22] for computing policies for batch mode reinforcement learning in a deterministic setting. First, we show that this problem is NP-hard. In the two-stage case, we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [22].

arXiv.org e-Print Archive

Open Repository and Bibliography - Liège

Benchmarking for Bayesian Reinforcement Learning

Author: Castronovo Michael
Couetoux Adrien
Ernst Damien
Fonteneau Raphael
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 14/09/2015

In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment, using some prior knowledge that is available beforehand. Many BRL algorithms have already been proposed, but even though a few toy examples exist in the literature, there are still no extensive or rigorous benchmarks to compare them. The paper addresses this problem and provides a new BRL comparison methodology along with the corresponding open source library. This methodology defines a comparison criterion that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from some probability distributions. In order to enable the comparison of non-anytime algorithms, our methodology also includes a detailed analysis of the computation time requirement of each algorithm. Our library is released with all source code and documentation: it includes three test problems, each of which has two different prior distributions, and seven state-of-the-art RL algorithms. Finally, our library is illustrated by comparing all the available algorithms, and the results are discussed.
Comment: 37 pages

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Open Repository and Bibliography - Liège

FigShare

On overfitting and asymptotic bias in batch reinforcement learning with partial observability

Author: Ernst Damien
Fonteneau Raphael
Francois-Lavet Vincent
Pineau Joelle
Rabusseau Guillaume
Publication date: 06/02/2019

This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis shows formally that a smaller state representation, while potentially increasing the asymptotic bias, decreases the risk of overfitting. This analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smart grids, with real-world data. Finally, similarly to known results in the fully observable setting, we also briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may improve the tradeoff between asymptotic bias and overfitting in the partially observable context.
Comment: Accepted at the Journal of Artificial Intelligence Research (JAIR) - 31 pages
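The state representation used in the experiments, a truncated history of observations, can be pictured with a small sketch. The class name `TruncatedHistory`, the padding value, and the tuple encoding are assumptions for illustration. The abstract only states that the representation keeps a limited window of past observations.

```python
from collections import deque


class TruncatedHistory:
    """Keep only the last `length` observations as the agent's state."""

    def __init__(self, length, pad=None):
        # Pre-fill with a padding value so the state always has the same size.
        self._buffer = deque([pad] * length, maxlen=length)

    def push(self, observation):
        """Record a new observation and return the current state as a tuple."""
        self._buffer.append(observation)  # the oldest entry is dropped automatically
        return tuple(self._buffer)
```

In this sketch, a shorter `length` gives a smaller state space. That is the direction in which the abstract says overfitting risk decreases while asymptotic bias may grow.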

arXiv.org e-Print Archive

Open Repository and Bibliography - Liège

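9. Two cases of gram-negative bacillus sepsis in diabetic patients (585th Regular Meeting of the Chiba Medical Society / Regular Meeting of the First Department of Internal Medicine Alumni Association)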

Author: Adrien Couëtoux (2837570)
Damien Ernst (2837564)
Michael Castronovo (2837567)
Raphael Fonteneau (2837573)
Publication venue: Chiba Medical Society (千葉医学会)

Offline computation cost vs. performance (inaccurate case).

FigShare

Generating informative trajectories by using bounds on the return of control policies

Author: A Statnikov
Damien Ernst
G Cawley
G Dror
I Guyon
Louis Wehenkel
Raphael Fonteneau
Susan A Murphy
V Lemaire
Publication date: 01/01/2010

Abstract: We propose new methods for guiding the generation of informative trajectories when solving discrete-time optimal control problems. These methods exploit recently published results that provide ways for computing bounds on the return of control policies from a set of trajectories.
Keywords: reinforcement learning, optimal control, sampling strategies
Introduction: Discrete-time optimal control problems arise in many fields such as finance, medicine, engineering, and artificial intelligence. Whatever the techniques used for solving such problems, their performance is related to the amount of information available on the system dynamics and the reward function of the optimal control problem. In this paper, we consider settings in which information on the system dynamics must be inferred from trajectories and, furthermore, due to cost and time constraints, only a limited number of trajectories can be generated. We assume that a regularity structure, given in the form of Lipschitz continuity assumptions, exists on the system dynamics and the reward function. Under such assumptions, we exploit recently published methods for computing bounds on the return of control policies from a set of trajectories

CiteSeerX